fix deleteat! and subset! performance #3249
Interesting. Have you checked with more columns and with a lower percentage of dropped rows? I would expect the
You are right! 🧠 Empirically, it is better only when fewer than about 5% of observations are dropped (the exact threshold probably also depends on the number of columns, but I wanted something relatively simple). I have proposed an adaptive algorithm that switches between the two approaches as needed.
Cool. Can you check when there are many columns? That's a use case that we care about.
Here is an example: 100 columns, 10^6 rows. Tested with 5.5% of rows dropped (so a bit above the 5% threshold). Setup:
This PR:
Current release:
I did some more tests on even wider tables and it seems that a more precise threshold is 6% on my laptop, so I changed it to that value.
OK, great!
Thank you!
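For readers skimming the thread, the adaptive switching discussed above can be sketched roughly as follows. This is a minimal illustration on plain vectors, not the code in this PR: the function name `adaptive_delete!`, the `threshold` keyword, and the choice of which strategy runs on which side of the threshold are all assumptions made for the example. The 6% default is the value reported in the thread.

```julia
# Hypothetical sketch of threshold-based switching; not the DataFrames.jl implementation.
# `columns` holds equal-length column vectors; `drop` must be a sorted vector
# of unique row indices. `threshold` is the fraction of dropped rows at which
# the strategy changes (0.06 is the heuristic value mentioned in the thread).
function adaptive_delete!(columns::Vector{<:AbstractVector}, drop::Vector{Int};
                          threshold::Float64=0.06)
    isempty(columns) && return columns
    n = length(first(columns))
    if length(drop) < threshold * n
        # Assumed branch: few rows dropped, so delete in place column by column.
        for col in columns
            deleteat!(col, drop)
        end
    else
        # Assumed branch: many rows dropped, so build a keep mask once
        # and replace each column with a copy of its kept rows.
        keep = trues(n)
        keep[drop] .= false
        for i in eachindex(columns)
            columns[i] = columns[i][keep]
        end
    end
    return columns
end
```

Both branches give the same result; only the cost differs. Where the crossover sits is machine- and shape-dependent, which is why the thread treats the threshold as an empirical heuristic.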
Explanation in https://discourse.julialang.org/t/learning-to-benchmark-and-find-the-best-function-to-select-a-subset-of-a-dataframe/91704/12.
Benchmarks
This PR
1.4.4 release